Add crashes_after_fix rule to flag fixed crash bugs still crashing on Nightly #2881

Open
spohlMozilla wants to merge 2 commits into mozilla:master from spohlMozilla:crashes-after-fix

@spohlMozilla

The macOS and Windows Spotlight teams have repeatedly hit the same
workflow gap: a crash gets a speculative fix, the patch lands, the bug
is marked RESOLVED FIXED -- and the signature keeps firing on Nightly
after the build containing the fix has shipped. With nothing prompting
us to re-check crash-stats a few days post-landing, this verification
step gets skipped, and we have ended up discovering only much later
(in some cases weeks or months) that the speculative fix didn't
actually move the crash numbers.

This rule plugs that gap. Once a day it picks RESOLVED FIXED bugs where
cf_status_firefox_nightly is "fixed" and cf_last_resolved falls between
min_days_since_fix (default 4) and max_days_since_fix (default 10) ago,
runs a faceted Socorro SuperSearch over Nightly for the bug's
signature(s) starting the day after the fix landed, and -- if
min_crash_count (default 5) or more crashes have been recorded in that
window -- needinfos the assignee asking whether the fix was incomplete,
whether the signature is shared with a different underlying crash, or
whether a follow-up is needed.
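
The selection and threshold steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names are hypothetical, the window bounds are assumed to be inclusive, and the query is shown only as a parameter dictionary for Socorro's SuperSearch endpoint (no request is made).

```python
from datetime import date, timedelta

# Defaults quoted in the description; the helper names below are hypothetical.
MIN_DAYS_SINCE_FIX = 4
MAX_DAYS_SINCE_FIX = 10
MIN_CRASH_COUNT = 5


def in_fix_window(last_resolved: date, today: date,
                  min_days: int = MIN_DAYS_SINCE_FIX,
                  max_days: int = MAX_DAYS_SINCE_FIX) -> bool:
    """True if the bug was resolved between min_days and max_days ago.

    Inclusive bounds are an assumption; the description does not say.
    """
    days = (today - last_resolved).days
    return min_days <= days <= max_days


def supersearch_params(signatures: list[str], last_resolved: date) -> dict:
    """Build SuperSearch parameters counting Nightly crashes after the fix."""
    start = last_resolved + timedelta(days=1)  # the day after the fix landed
    return {
        "signature": ["=" + s for s in signatures],  # "=" requests an exact match
        "release_channel": "nightly",
        "date": ">=" + start.isoformat(),
        "_facets": "signature",   # per-signature counts
        "_results_number": 0,     # only the facet counts are needed
    }


def should_needinfo(crash_count: int,
                    min_crash_count: int = MIN_CRASH_COUNT) -> bool:
    """True once the post-fix crash count reaches the threshold."""
    return crash_count >= min_crash_count
```

For example, a bug resolved on 2026-05-14 would be queried with `date` set to `>=2026-05-15`, and the assignee would be pinged only if the facet count for its signature(s) reached 5.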

The four-day floor gives the Nightly build containing the fix time to
roll out and accumulate user exposure before the bot will fire. The
rule skips bugs that already have any open needinfo, and also skips
bugs whose comment history contains the rule's marker phrase, so it
only pings the assignee once per fix.
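
A rough sketch of those two skip conditions, assuming bug and comment data shaped like the Bugzilla REST API's `flags` and `comments` entries. The function names and the marker string are hypothetical; the description does not quote the rule's actual phrase.

```python
# Hypothetical placeholder; the rule's real marker phrase is not given above.
MARKER_PHRASE = "<crashes_after_fix marker>"


def has_open_needinfo(bug: dict) -> bool:
    """True if any needinfo flag on the bug is still pending (status "?")."""
    return any(
        flag.get("name") == "needinfo" and flag.get("status") == "?"
        for flag in bug.get("flags", [])
    )


def already_pinged(comments: list[dict], marker: str = MARKER_PHRASE) -> bool:
    """True if an earlier comment already contains the rule's marker phrase."""
    return any(marker in comment.get("text", "") for comment in comments)


def should_skip(bug: dict, comments: list[dict]) -> bool:
    """Skip bugs with an open needinfo or a previous ping from this rule."""
    return has_open_needinfo(bug) or already_pinged(comments)
```

Checking the comment history rather than keeping separate state means the once-per-fix guarantee survives bot restarts, at the cost of fetching comments for each candidate bug.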

@spohlMozilla
Author

@suhaibmujahid would you mind taking a look when you have a moment? I couldn't add you as a formal reviewer (external-contributor permissions). Thanks!

@marco-c requested a review from suhaibmujahid May 18, 2026 08:52
@marco-c
Contributor

marco-c commented May 18, 2026

Given we already have min_crash_count and we can look specifically at Nightly builds after the fix, we don't really need the 4-day delay.

Per marco-c's review feedback: min_crash_count plus the
"date >= fix_date + 1 day" Socorro filter already gate pings, so the
4-day floor was redundant. Removing it means the rule fires as soon as
the threshold is crossed -- fast-burning regressions get caught earlier,
slow-burning ones are still gated by min_crash_count. max_days_since_fix
is kept as the upper bound on how long we keep polling a bug whose
crash count is still below the threshold.
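
With the floor removed, the daily decision for a candidate bug reduces to something like the sketch below. The function name, the returned labels, and the order of the checks are illustrative assumptions rather than the rule's actual code; the defaults are the ones quoted in the PR description.

```python
def poll_decision(days_since_fix: int, crash_count: int,
                  min_crash_count: int = 5,
                  max_days_since_fix: int = 10) -> str:
    """Decide what to do with a fixed bug on today's run.

    "stop": past the upper bound, so the bug is no longer polled.
    "ping": the post-fix crash count has reached the threshold.
    "wait": still under the threshold; check again tomorrow.
    """
    if days_since_fix > max_days_since_fix:
        return "stop"
    if crash_count >= min_crash_count:
        return "ping"
    return "wait"
```

Under this sketch a fast-burning regression with 12 crashes one day after the fix yields "ping" immediately, while a signature with 2 crashes is rechecked daily until day 10.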